Load packages

suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(gapminder))

View the first few rows of the dataset

head(gapminder)
## # A tibble: 6 x 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.

Inspect which continents are present

unique(gapminder$continent)
## [1] Asia     Europe   Africa   Americas Oceania 
## Levels: Africa Americas Asia Europe Oceania

Part 1: Factor Management

Structure of the data before removing Oceania

str(gapminder)
## Classes 'tbl_df', 'tbl' and 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ pop      : int  8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num  779 821 853 836 740 ...

Structure of the data after removing Oceania and dropping its unused factor levels

gapminder %>%
  filter(continent != "Oceania") %>% 
  droplevels() %>% 
  str()
## Classes 'tbl_df', 'tbl' and 'data.frame':    1680 obs. of  6 variables:
##  $ country  : Factor w/ 140 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 4 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ pop      : int  8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num  779 821 853 836 740 ...

We can tell from the output above that:

* The number of rows decreased from 1704 to 1680.
* The number of countries decreased from 142 to 140.
* The number of continents decreased from 5 to 4.
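The drop in level counts comes from droplevels(), not from filter() alone. A minimal sketch in base R, using a small made-up factor (hypothetical data, not taken from gapminder), shows the difference:

```r
# Hypothetical toy factor, for illustration only
continent <- factor(c("Asia", "Europe", "Oceania", "Asia"))

# Subsetting removes the Oceania observation but keeps the level
kept <- continent[continent != "Oceania"]
nlevels(kept)

# droplevels() removes levels that no longer occur in the data
cleaned <- droplevels(kept)
nlevels(cleaned)
levels(cleaned)
```

Without the droplevels() step, the unused level would still show up in str() output and in plot legends.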

suppressPackageStartupMessages(library(forcats))

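Now arrange the rows by gdpPercap. Compared with the scatter plot in Part 3, this has no effect on the colors or the legend: arrange() sorts rows and leaves the factor levels untouched, so only the drawing order of the points changes.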

gapminder %>% 
  arrange(gdpPercap) %>% 
  ggplot(aes(log(gdpPercap),lifeExp,color=continent)) + geom_point()
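
arrange() sorts rows without touching factor levels, which is why the plot is unaffected. To change the order of the levels themselves (and hence the legend), the factor has to be releveled. A sketch using forcats::fct_reorder(); ordering continents by median gdpPercap is an illustrative choice, not something the original analysis does:

```r
library(dplyr)
library(forcats)
library(gapminder)

# arrange() sorts rows only; the continent levels are unchanged
arranged <- gapminder %>% arrange(gdpPercap)
identical(levels(arranged$continent), levels(gapminder$continent))

# fct_reorder() reorders the levels by a summary of another variable
# (illustrative choice: median gdpPercap within each continent)
reordered <- gapminder %>%
  mutate(continent = fct_reorder(continent, gdpPercap, .fun = median))
levels(reordered$continent)
```

Plotting `reordered` instead of `arranged` would list the continents in the legend in the new level order.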

There are two countries in Oceania: Australia and New Zealand. Together they account for the 24 rows and 2 country levels removed above.

gapminder %>% 
  filter(continent == "Oceania")
## # A tibble: 24 x 6
##    country   continent  year lifeExp      pop gdpPercap
##    <fct>     <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Australia Oceania    1952    69.1  8691212    10040.
##  2 Australia Oceania    1957    70.3  9712569    10950.
##  3 Australia Oceania    1962    70.9 10794968    12217.
##  4 Australia Oceania    1967    71.1 11872264    14526.
##  5 Australia Oceania    1972    71.9 13177000    16789.
##  6 Australia Oceania    1977    73.5 14074100    18334.
##  7 Australia Oceania    1982    74.7 15184200    19477.
##  8 Australia Oceania    1987    76.3 16257249    21889.
##  9 Australia Oceania    1992    77.6 17481977    23425.
## 10 Australia Oceania    1997    78.8 18565243    26998.
## # ... with 14 more rows

Part 2: File I/O

Compute the mean lifeExp per country and save it to mean_life_exp.csv

(mean_life_exp = gapminder %>% 
  group_by(country) %>% 
  summarise(mu = mean(lifeExp)))
## # A tibble: 142 x 2
##    country        mu
##    <fct>       <dbl>
##  1 Afghanistan  37.5
##  2 Albania      68.4
##  3 Algeria      59.0
##  4 Angola       37.9
##  5 Argentina    69.1
##  6 Australia    74.7
##  7 Austria      73.1
##  8 Bahrain      65.6
##  9 Bangladesh   49.8
## 10 Belgium      73.6
## # ... with 132 more rows
write_csv(mean_life_exp, "mean_life_exp.csv")
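
A CSV file stores plain text, so the factor type of country is not preserved when the file is read back. The sketch below writes to a temporary file (a hypothetical path, used here so the example does not overwrite mean_life_exp.csv) and restores the factor with readr's col_factor():

```r
library(dplyr)
library(readr)
library(gapminder)

mean_life_exp <- gapminder %>%
  group_by(country) %>%
  summarise(mu = mean(lifeExp))

# Temporary file used only for this illustration
path <- tempfile(fileext = ".csv")
write_csv(mean_life_exp, path)

# By default, country comes back as a character column
plain <- read_csv(path, show_col_types = FALSE)
class(plain$country)

# Declaring the column types restores the factor
reloaded <- read_csv(
  path,
  col_types = cols(country = col_factor(), mu = col_double())
)
class(reloaded$country)
```

With col_factor() and no explicit levels, readr takes the levels in the order they appear in the file.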

Part 3: Visualization Design

Load package plotly

suppressPackageStartupMessages(library(plotly))

A scatter plot of lifeExp vs log(gdpPercap), colored by continent

(p = ggplot(gapminder, aes(log(gdpPercap), lifeExp, color=continent)) + geom_point())

Here is the same plot in plotly. Hovering over a point shows its values and its continent, which is especially helpful here because many points overlap.

plot_ly(gapminder, x=~log(gdpPercap), y=~lifeExp, color=~continent)
## No trace type specified:
##   Based on info supplied, a 'scatter' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#scatter
## No scatter mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plot.ly/r/reference/#scatter-mode
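
The messages above appear because the trace type and mode were left for plotly to guess. A minimal sketch that states them explicitly, which should produce the same figure without the messages:

```r
library(plotly)
library(gapminder)

# Same plot as above, with the trace type and mode spelled out
fig <- plot_ly(
  gapminder,
  x = ~log(gdpPercap),
  y = ~lifeExp,
  color = ~continent,
  type = "scatter",
  mode = "markers"
)
fig
```

Another option is ggplotly(p), which converts the existing ggplot object instead of rebuilding the plot.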

Part 4: Writing Figures to File

Save the scatter plot above to a file

ggsave("scatter_plot.png", plot=p)
## Saving 7 x 5 in image

Specifying plot=p matters when you have made multiple plots and want to save only one of them. Without it, ggsave() saves the most recent plot.

p0 = ggplot(gapminder, aes(lifeExp)) + geom_histogram()
ggsave("scatter_plot0.png", plot=p)
## Saving 7 x 5 in image
ggsave("hist0.png", plot=p0)
## Saving 7 x 5 in image
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
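
The "Saving 7 x 5 in image" message means ggsave() fell back to the size of the current graphics device. A sketch that fixes the size and resolution explicitly, so the output does not depend on the device; the file name and the temporary directory are hypothetical choices for this illustration:

```r
library(ggplot2)
library(gapminder)

p <- ggplot(gapminder, aes(log(gdpPercap), lifeExp, color = continent)) +
  geom_point()
# Setting bins avoids the stat_bin() message about the default of 30
p0 <- ggplot(gapminder, aes(lifeExp)) + geom_histogram(bins = 30)

# Hypothetical output path in the session's temporary directory
out <- file.path(tempdir(), "scatter_plot_example.png")

# plot = p selects the scatter plot even though p0 was created later
ggsave(out, plot = p, width = 7, height = 5, units = "in", dpi = 300)
file.exists(out)
```

Passing width, height and dpi also makes the saved figure reproducible across machines.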

Here we embed the scatter plot into this document.

[Image: scatter]